AITopics | fine-grained classification

Collaborating Authors

fine-grained classification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Maximum-Entropy Fine Grained Classification

Abhimanyu Dubey, Otkrist Gupta, Ramesh Raskar, Nikhil Naik

Neural Information Processing SystemsFeb-12-2026, 05:31:13 GMT

Neural Information Processing Systems http://nips.cc/

classification, dataset, entropy, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > North Yorkshire > York (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Maximum-Entropy Fine Grained Classification

Abhimanyu Dubey, Otkrist Gupta, Ramesh Raskar, Nikhil Naik

Neural Information Processing SystemsNov-20-2025, 14:21:52 GMT

Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data.

artificial intelligence, entropy, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > Quebec > Montreal (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training

Xu, Donglai, Yang, Hongzheng, Zhao, Yuzhi, Zhang, Pingping, Chen, Jinpeng, Ma, Wenao, Hou, Zhijian, Wu, Mengyang, Li, Xiaolei, Hu, Senkang, Guan, Ziyi, Li, Jason Chun Lok, Po, Lai Man

arXiv.org Artificial IntelligenceNov-12-2025

Reinforcement Learning with Verifiable Rewards (RLVR) for Multimodal Large Language Models (MLLMs) is highly dependent on high-quality labeled data, which is often scarce and prone to substantial annotation noise in real-world scenarios. Existing unsupervised RLVR methods, including pure entropy minimization, can overfit to incorrect labels and limit the crucial reward ranking signal for Group-Relative Policy Optimization (GRPO). To address these challenges and enhance noise tolerance, we propose a novel two-stage, token-level entropy optimization method for RLVR. This approach dynamically guides the model from exploration to exploitation during training. In the initial exploration phase, token-level entropy maximization promotes diverse and stochastic output generation, serving as a strong regularizer that prevents premature convergence to noisy labels and ensures sufficient intra-group variation, which enables more reliable reward gradient estimation in GRPO. As training progresses, the method transitions into the exploitation phase, where token-level entropy minimization encourages the model to produce confident and deterministic outputs, thereby consolidating acquired knowledge and refining prediction accuracy. Empirically, across three MLLM backbones - Qwen2-VL-2B, Qwen2-VL-7B, and Qwen2.5-VL-3B - spanning diverse noise settings and multiple tasks, our phased strategy consistently outperforms prior approaches by unifying and enhancing external, internal, and entropy-based methods, delivering robust and superior performance across the board.

grpow, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.07738

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

Fine-Grained Classification: Connecting Metadata via Cross-Contrastive Pre-Training

Mamtani, Sumit, Thesia, Yash

arXiv.org Artificial IntelligenceOct-21-2025

Abstract--Fine-grained visual classification aims to recognize objects belonging to many subordinate categories of a supercat-egory, where appearance alone often fails to distinguish highly similar classes. We propose a unified framework that integrates image, text, and metadata via cross-contrastive pre-training. We first align the three modality encoders in a shared embedding space and then fine-tune the image and metadata encoders for classification. On NABirds [1], our approach improves over the baseline by 7.83% and achieves 84.44% top-1 accuracy, outperforming strong multimodal methods. The challenge is that inter-class differences are subtle while intra-class variation (pose, background, lighting) can be large.

artificial intelligence, machine learning, metadata, (13 more...)

arXiv.org Artificial Intelligence

2504.20322

Genre: Research Report (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation

Neural Information Processing SystemsMay-26-2025, 19:18:10 GMT

Fine-grained visual classification (FGVC) involves classifying closely related subcategories. This task is inherently difficult due to the subtle differences between classes and the high intra-class variance. Moreover, FGVC datasets are typically small and challenging to gather, thus highlighting a significant need for effective data augmentation.Recent advancements in text-to-image diffusion models have introduced new possibilities for data augmentation in image classification. While these models have been used to generate training data for classification tasks, their effectiveness in full-dataset training of FGVC models remains under-explored. Recent techniques that rely on text-to-image generation or Img2Img methods, such as SDEdit, often struggle to generate images that accurately represent the class while modifying them to a degree that significantly increases the dataset's diversity.

artificial intelligence, machine learning, structure and subject preserving augmentation, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Accessing Vision Foundation Models at ImageNet-level Costs

Zhang, Yitian, Ma, Xu, Bai, Yue, Wang, Huan, Fu, Yun

arXiv.org Artificial IntelligenceJul-14-2024

Vision foundation models are renowned for their generalization ability due to massive training data. Nevertheless, they demand tremendous training resources, and the training data is often inaccessible, e.g., CLIP, DINOv2, posing great challenges to developing derivatives that could advance research in this field. In this work, we offer a very simple and general solution, named Proteus, to distill foundation models into smaller equivalents on ImageNet-1K without access to the original training data. Specifically, we remove the designs from conventional knowledge distillation settings that result in dataset bias and present three levels of training objectives, i.e., token, patch, and feature, to maximize the efficacy of knowledge transfer. In this manner, Proteus is trained at ImageNet-level costs with surprising ability, facilitating the accessibility of training foundation models for the broader research community. Leveraging DINOv2-g/14 as the teacher, Proteus-L/14 matches the performance of the Oracle method DINOv2-L/14 (142M training data) across 15 benchmarks and outperforms other vision foundation models including CLIP-L/14 (400M), OpenCLIP-L/14 (400M/2B) and SynCLR-L/14 (600M). Code is available at here.

dataset, foundation model, proteus, (14 more...)

arXiv.org Artificial Intelligence

2407.10366

Genre:

Instructional Material (0.54)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)

Add feedback

Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning

Chiu, Christopher, Heil, Maximilian, Kim, Teresa, Miyaguchi, Anthony

arXiv.org Artificial IntelligenceJul-10-2024

FungiCLEF 2024 addresses the fine-grained visual categorization (FGVC) of fungi species, with a focus on identifying poisonous species. This task is challenging due to the size and class imbalance of the dataset, subtle inter-class variations, and significant intra-class variability amongst samples. In this paper, we document our approach in tackling this challenge through the use of ensemble classifier heads on pre-computed image embeddings. Our team (DS@GT) demonstrate that state-of-the-art self-supervised vision models can be utilized as robust feature extractors for downstream application of computer vision tasks without the need for task-specific fine-tuning on the vision backbone. Our approach achieved the best Track 3 score (0.345), accuracy (78.4%) and macro-F1 (0.577) on the private test set in post competition evaluation. Our code is available at https://github.com/dsgt-kaggle-clef/fungiclef-2024.

fine-grained classification, poisonous fungus identification, transfer learning

arXiv.org Artificial Intelligence

2407.07492

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.40)

Add feedback

Performance of computer vision algorithms for fine-grained classification using crowdsourced insect images

Pucci, Rita, Kalkman, Vincent J., Stowell, Dan

arXiv.org Artificial IntelligenceApr-4-2024

With fine-grained classification, we identify unique characteristics to distinguish among classes of the same super-class. We are focusing on species recognition in Insecta, as they are critical for biodiversity monitoring and at the base of many ecosystems. With citizen science campaigns, billions of images are collected in the wild. Once these are labelled, experts can use them to create distribution maps. However, the labelling process is time-consuming, which is where computer vision comes in. The field of computer vision offers a wide range of algorithms, each with its strengths and weaknesses; how do we identify the algorithm that is in line with our application? To answer this question, we provide a full and detailed evaluation of nine algorithms among deep convolutional networks (CNN), vision transformers (ViT), and locality-based vision transformers (LBVT) on 4 different aspects: classification performance, embedding quality, computational cost, and gradient activity. We offer insights that we haven't yet had in this domain proving to which extent these algorithms solve the fine-grained tasks in Insecta. We found that the ViT performs the best on inference speed and computational cost while the LBVT outperforms the others on performance and embedding quality; the CNN provide a trade-off among the metrics.

classification, coleoptera ob, dataset, (15 more...)

arXiv.org Artificial Intelligence

2404.03474

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Asia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.70)

Add feedback

Comparison between transformers and convolutional models for fine-grained classification of insects

Pucci, Rita, Kalkman, Vincent J., Stowell, Dan

arXiv.org Artificial IntelligenceJul-20-2023

Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when applied to identifying species within the same taxonomical class. This is because species are often sharing morphological characteristics that make them difficult to differentiate. We consider the taxonomical class of Insecta. The identification of insects is essential in biodiversity monitoring as they are one of the inhabitants at the base of many ecosystems. Citizen science is doing brilliant work of collecting images of insects in the wild giving the possibility to experts to create improved distribution maps in all countries. We have billions of images that need to be automatically classified and deep neural network algorithms are one of the main techniques explored for fine-grained tasks. At the SOTA, the field of deep learning algorithms is extremely fruitful, so how to identify the algorithm to use? We focus on Odonata and Coleoptera orders, and we propose an initial comparative study to analyse the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT, a fully transformer-base, EfficientNet, a fully convolutional-base, and ViTAE, a hybrid. We analyse the performance of the three models in identical conditions evaluating the performance per species, per morph together with sex, the inference time, and the overall performance with unbalanced datasets of images from smartphones. Although we observe high performances with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional-base and fully transformer-base models on accuracy performance and the fully transformer-base model outperforms the others on inference speed and, these prove the transformer to be robust to the shortage of samples and to be faster at inference time.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2307.11112

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers

Lagunas, Manuel, Impata, Brayan, Martinez, Victor, Fernandez, Virginia, Georgakis, Christos, Braun, Sofia, Bertrand, Felipe

arXiv.org Artificial IntelligenceMay-17-2023

Fine-grained classification is a challenging task that involves identifying subtle differences between objects within the same category. This task is particularly challenging in scenarios where data is scarce. Visual transformers (ViT) have recently emerged as a powerful tool for image classification, due to their ability to learn highly expressive representations of visual data using self-attention mechanisms. In this work, we explore Semi-ViT, a ViT model fine tuned using semi-supervised learning techniques, suitable for situations where we have lack of annotated data. This is particularly common in e-commerce, where images are readily available but labels are noisy, nonexistent, or expensive to obtain. Our results demonstrate that Semi-ViT outperforms traditional convolutional neural networks (CNN) and ViTs, even when fine-tuned with limited annotated data. These findings indicate that Semi-ViTs hold significant promise for applications that require precise and fine-grained classification of visual data.

artificial intelligence, inductive learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2305.10018

Country:

South America > Brazil (0.05)
North America > Mexico (0.05)
North America > Central America (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback